From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data
Abstract
The development of high-throughput sequencing technologies has provided microbial ecologists with an efficient approach to assess bacterial diversity at an unprecedented depth, particularly with the recent advances in the Illumina MiSeq sequencing platform. However, analyzing such high-throughput data poses important computational challenges, requiring specialized bioinformatics solutions at different stages of the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors, and clustering of the resulting sequences into Operational Taxonomic Units (OTUs). Individual algorithms addressing each of those challenges have been combined into various bioinformatics pipelines, such as mothur, QIIME, LotuS, and USEARCH. Using a set of well-described bacterial mock communities, state-of-the-art pipelines for Illumina MiSeq amplicon sequencing data are benchmarked in terms of the number of sequences retained, computational cost, error rate, and quality of the OTUs. In addition, a new pipeline called OCToPUS is introduced, which combines different algorithms in an optimized way. Large variability is observed between the different pipelines with respect to the monitored performance parameters; in general, the number of retained reads is found to be inversely proportional to the quality of the reads. By contrast, OCToPUS achieves the lowest error rate, the smallest number of spurious OTUs, and the closest correspondence to the existing community, while retaining the highest number of reads compared with the other pipelines. The newly introduced pipeline translates Illumina MiSeq amplicon sequencing data into high-quality and reliable OTUs, with improved performance and accuracy compared to the currently existing pipelines.
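As an illustration of the error-rate benchmark on mock communities, the following minimal Python sketch counts mismatches between reads and the known reference sequences they derive from. It rests on assumptions not stated in the abstract: the reads are already aligned to their references at equal length, and the error rate is defined as mismatched positions divided by all compared positions. The function name `error_rate` and the toy sequences are hypothetical; the exact definition used by the authors may differ.

```python
def error_rate(aligned_pairs):
    """Return the fraction of mismatched positions over all compared positions.

    aligned_pairs: iterable of (read, reference) strings that are assumed
    to be pre-aligned, i.e. of equal length (an assumption of this sketch).
    """
    mismatches = 0
    compared = 0
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal aligned length")
        # Compare position by position, ignoring letter case.
        for r, t in zip(read.upper(), ref.upper()):
            compared += 1
            if r != t:
                mismatches += 1
    # Avoid division by zero when no positions were compared.
    return mismatches / compared if compared else 0.0


# Toy example: two reads against the same hypothetical reference.
pairs = [
    ("ACGTACGT", "ACGTACGT"),  # identical to the reference
    ("ACGTTCGT", "ACGTACGT"),  # one substitution
]
rate = error_rate(pairs)
print(f"error rate: {rate:.4f}")
```

Because the composition of a mock community is known in advance, every read can be compared against a true reference, which is what makes a per-base error rate measurable in the first place.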
Similar resources
Fast and Simple Analysis of MiSeq Amplicon Sequencing Data with MetaAmp
Microbial community profiling by barcoded 16S rRNA gene amplicon sequencing currently has many applications in microbial ecology. The low costs of the parallel sequencing of multiplexed samples, combined with the relative ease of data processing and interpretation (compared to shotgun metagenomes) have made this an entry-level approach. Here we present the MetaAmp pipeline for processing of SSU...
A novel conceptual approach to read-filtering in high-throughput amplicon sequencing studies
Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors will also result in OTUs that are not accurate rec...
Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering
BACKGROUND High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recen...
LotuS: an efficient and user-friendly OTU processing pipeline
BACKGROUND 16S ribosomal DNA (rDNA) amplicon sequencing is frequently used to analyse the structure of bacterial communities from oceans to the human microbiota. However, computational power is still a major bottleneck in the analysis of continuously enlarging metagenomic data sets. Analysis is further complicated by the technical complexity of current bioinformatics tools. RESULTS Here we pr...
From sequencing reads to microbial Diversity: bioinformatic Algorithms for Processing amplicon sequencing Data
has revolutionized the field of microbial ecology by offering a cost-efficient method to assess microbial diversity at an unseen depth using 16S rRNA amplicon sequencing approaches. Different preprocessing algorithms need to be performed to obtain a collection of highly reliable sequencing reads, ending with a clustering step to group them into Operational Taxonomic Units (OTUs). However, this a...